A closed Candidatus Odinarchaeum chromosome exposes Asgard archaeal viruses

Asgard archaea have recently been identified as the closest archaeal relatives of eukaryotes. Their ecology, and particularly their virome, remain enigmatic. We reassembled and closed the chromosome of Candidatus Odinarchaeum yellowstonii LCB_4, through long-range PCR, revealing CRISPR spacers targeting viral contigs. We found related viruses in the genomes of diverse prokaryotes from geothermal environments, including other Asgard archaea. These viruses open research avenues into the ecology and evolution of Asgard archaea.

The LCB_4 genome contains a complex CRISPR-Cas gene system (Fig. 1), including neighbouring type I-A and type III-D Cas gene clusters, separated by a 6.1-kbp-long type I-A CRISPR array and further followed by another 2.7-kbp-long type I-A CRISPR array, with a total of 142 CRISPR spacers of 35-42 bp across both arrays. Nine of these spacers targeted (with 100% identity and query coverage) 4 putative mobile element contigs obtained in the same assembly that were not part of the closed chromosome (Fig. 1 and Supplementary Tables 1 and 2), all of which had Ca. Odinarchaeum predicted as the host by WIsH 8. In addition, we identified multiple poorer matches from spacers using SpacePHARER 9 (Fig. 1), possibly representing interactions with diverged relatives of these elements. Two of these contigs contained genes encoding common mobile element proteins, such as restriction endonucleases and integrases, but did not contain any obvious viral signature genes (Supplementary Table 3). A further circular contig encodes a double jelly-roll major capsid protein (MCP) (Supplementary Table 3). This specific protein was previously found in a study of the double jelly-roll MCP family and tentatively named an 'Odin group' of sequences given this protein's origin in the same metagenome as Ca. Odinarchaeum LCB_4 (ref. 10). The complete recovery of LCB_4's CRISPR arrays allowed us to confirm that this circular contig indeed represents a virus associated with Ca. Odinarchaeum (Supplementary Table 4), for which we suggest the name 'Huginn virus', in reference to one of the two ravens of Odin, Huginn ('thought').
Furthermore, 3 spacers yielded full-coverage, identical matches (and a further 3 spacers matched with 1 mismatch) against a 12.7-kbp-long contig recovered by the Ca. Odinarchaeum LCB_4 reassembly (Fig. 1). All three identical hits targeted an open reading frame encoding a protein-primed family B DNA polymerase (pPolB), a gene frequently observed in archaeal viruses. Further inspection of this contig revealed genes encoding a zinc-ribbon protein and a His1-like family MCP (Extended Data Fig. 3b-d and Supplementary Table 3), conserved in spindle-shaped viruses 11. This contig had a coverage over 3 times higher than that of the chromosome, suggestive of viral DNA replication, and was flanked by approximately 80-nucleotide-long terminal inverted repeats, a typical signature of viruses with linear double-stranded DNA genomes replicated by pPolBs 12. Thus, this contig represents a complete Asgard archaeal viral genome for which we suggest the name 'Muninn virus' (Supplementary Table 4), in reference to the second raven of Odin, Muninn ('memory').
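The terminal inverted repeat signature described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function names are hypothetical, and it assumes an exact (mismatch-free) repeat on a linear contig given as a plain DNA string.

```python
# Hypothetical helper names; exact-match terminal inverted repeats only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(contig: str) -> int:
    """Count how many leading bases equal the reverse complement of the 3' end.

    A linear genome flanked by inverted repeats reads the same from the
    5' end of either strand, so the contig and its reverse complement
    share a common prefix whose length is the repeat length.
    """
    seq = contig.upper()
    rc = reverse_complement(seq)
    length = 0
    # Only scan half of the contig so the two repeat copies cannot overlap.
    for i in range(len(seq) // 2):
        if seq[i] != rc[i]:
            break
        length += 1
    return length


if __name__ == "__main__":
    left = "ACGTTGCA"
    toy = left + "AAAAAC" + reverse_complement(left)
    print(terminal_inverted_repeat_length(toy))
```

A real contig would normally need some tolerance for mismatches and for incomplete assembly of the contig ends, which this sketch leaves out.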
We further queried the pPolB sequence from the Muninn virus genome through phylogenetic analysis, finding that it is closely related to a homologue in Sulfolobus ellipsoid virus 1 (SEV1) 13 (Fig. 2a and Supplementary Fig. 1), recently isolated from a Costa Rican hot spring. No other genes were shared between Muninn virus and SEV1, which is indicative of recent horizontal transfer of polB in at least one of these viruses. Interestingly, other close homologues included multiple sequences that were likewise obtained from hot springs or hydrothermal vents (Fig. 2a). Two of these hits were part of an Asgard archaeal MAG (QZMA23B3), and a third pPolB homologue (HGY28086.1) belonged to a MAG (SpSt-845) originally classified as Bathyarchaeota. A phylogenomic analysis indicated that QZMA23B3 belonged to the recently described Asgard archaeal class Jordarchaeia 6 and that SpSt-845 in fact belonged to the Nitrososphaeria (Extended Data Fig. 4). Closer inspection of the Nitrososphaerial MAG revealed 2 additional pPolB sequences from the same MAG that were highly similar (>80% identity) to HGY28086.1. The five pPolB homologues were encoded in contigs containing Sulfolobus islandicus rod-shaped virus 2 (SIRV2) family MCP genes (Fig. 2b, Extended Data Fig. 3e and Supplementary Table 3), exclusive to archaeal filamentous viruses with linear double-stranded DNA genomes and classified into the realm Adnaviria 14. Both the Jordarchaeia and Nitrososphaeria contigs displayed high conservation in synteny and protein sequences, indicating high contig completeness and recent diversification (Fig. 2b). Notably, none of the known archaeal viruses with SIRV2 family MCPs encodes its own pPolB, suggesting that the group identified herein represents a previously undescribed archaeal virus family. However, while we detected CRISPR arrays in the MAGs where these viral contigs were identified, we could not find accurate spacer matches (query coverage >90%, identity >90%) to these viral sequences; therefore, the identity of the hosts of these thermophilic viruses is unclear.
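The two spacer-match criteria used in the text (100% identity and query coverage for confirmed targets; query coverage >90% and identity >90% for an accurate match) can be written as a small Python sketch. The `SpacerHit` record and the category labels are assumptions introduced for illustration and do not reproduce any tool's output format.

```python
from dataclasses import dataclass


@dataclass
class SpacerHit:
    """Hypothetical record for one ungapped spacer-to-contig alignment."""
    spacer_id: str
    target_id: str
    spacer_length: int    # full length of the spacer (the query)
    aligned_length: int   # number of spacer bases in the alignment
    mismatches: int       # mismatched positions within the alignment

    @property
    def identity(self) -> float:
        return 100.0 * (self.aligned_length - self.mismatches) / self.aligned_length

    @property
    def query_coverage(self) -> float:
        return 100.0 * self.aligned_length / self.spacer_length


def classify_hit(hit: SpacerHit, min_identity: float = 90.0,
                 min_coverage: float = 90.0) -> str:
    """Label a hit as 'exact', 'accurate' or 'weak' (labels are illustrative)."""
    # Integer comparison avoids floating-point equality tests for exact hits.
    if hit.mismatches == 0 and hit.aligned_length == hit.spacer_length:
        return "exact"
    if hit.identity > min_identity and hit.query_coverage > min_coverage:
        return "accurate"
    return "weak"


if __name__ == "__main__":
    print(classify_hit(SpacerHit("sp1", "contig_a", 38, 38, 0)))
```

With these thresholds a 38-bp spacer aligned over 37 bp with one mismatch still counts as accurate, whereas a perfect match over only 30 of 40 bp does not.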
The pPolB phylogeny further suggests that a clade of viral sequences found in MAGs from mesophiles evolved from a likely thermophile-infecting ancestor. While none of the mentioned mobile elements shares other proteins with Muninn virus, a more distant relative of the Muninn virus pPolB sequence was found in a contig from the same LCB_4 assembly. Like Muninn virus, this sequence encoded a His1-like MCP and a gene encoding a transmembrane protein of unknown function (Fig. 2c). These two genes surrounded another gene encoding a relatively long protein (>550 amino acid residues) with multiple transmembrane helices and complex predicted structures (Extended Data Fig. 3f), with no detectable similarity but possibly related functions. We further queried the His1-like MCPs for detectable homologues, finding only a small Lokiarchaeial contig encoding two His1-like MCPs that are 83-85% identical to the Muninn virus MCP, plus a phylogenetically distant pPolB (Supplementary Fig. 1) and a protein of unknown function (Fig. 2c).
The CRISPR-Cas system of Ca. Odinarchaeum yellowstonii LCB_4 is likely its primary antiviral defence system. We could find no homologues for DISARM 15 or other recently discovered antiviral systems 16,17 in its genome. The retention of many CRISPR spacers against these mobile elements is significant and indicates coevolutionary dynamics with viruses from multiple families.
Two additional studies identifying Asgard archaeal viruses accompany ours. Rambo et al. 18 described viruses belonging to the Caudoviricetes class, while Medvedeva et al. 19 described three groups of viruses, of which two, skuldviruses and wyrdviruses, are distantly related to the Huginn and Muninn viruses, respectively, and are associated with Lokiarchaeal hosts. The sets of viruses found by these three studies thus complement each other. Our findings highlight the benefits of improving the quality of Asgard archaeal genomes. The discovery of viruses of thermophilic Asgard archaea expands our limited knowledge of the Asgard archaeal mobilome 18-20 and promises exciting advances in the study of the ecology, physiology and evolution of the closest archaeal relatives of eukaryotes.

[Gene-map labels from a figure panel. Huginn virus: transcriptional regulator, RdgC, AspBHI, transcription initiation factor E, adenine DNA methyltransferase, restriction enzyme, PF13156, HNH endonuclease, integrase, glycoside hydrolase. Muninn virus: transcriptional regulator (remaining labels not recoverable).]

Methods
Ca. Odinarchaeum LCB_4 genome reassembly. To reassemble the Ca. Odinarchaeum LCB_4 genome (Supplementary Fig. 1a), its corresponding Illumina reads 21 (BioSample SAMN04386028) were mapped against Asgard archaeal MAGs using Minimap2 (ref. 22) v.2.2.17. Mapped reads were extracted and assembled with Unicycler 23 v.0.4.4. Unicycler tested k-mer lengths ranging from 27 to 127; the latter was chosen to perform an assembly with default parameters. This assembly obtained a 1.406 Mbp contig, which was not predicted as circular despite both of its contig boundaries ending in type I-A CRISPR arrays (Supplementary Fig. 1b). Additional short (<13 kbp) contigs were not considered part of the main chromosome because they represented mobile elements (with signatures such as differing coverage, circularity, CRISPR spacer hits and/or presence of typical mobile element genes), ribosomal RNA genes from other organisms or CRISPR arrays (the latter two were expected due to the conservation of rRNA gene sequences and CRISPR repeats). After removing these contigs, only 1 additional contig of 10.6 kbp containing type I-A Cas genes remained. Given that the 1.406 Mbp contig ended in type I-A CRISPR arrays, we hypothesized that these two contigs could represent the entire circular chromosome of Ca. Odinarchaeum LCB_4. In parallel, we assembled the Illumina reads with MEGAHIT 24 v.1.1.3 (--k-min 57 --k-max 147 --k-step 12). While highly fragmented, this assembly found an alternative solution for the sequences involved in the contig borders of the previous assembly. In particular, in the assembly performed with a k-mer length of 141, we observed that the type I-A Cas genes were surrounded by 2 separate CRISPR arrays. Moreover, four consecutive spacers in the innermost side of one of the CRISPR arrays in this assembly were identical to the outermost spacers of the CRISPR array present at the border of the 1.406 Mbp contig in the Unicycler assembly (Supplementary Fig. 1b). These results suggested a specific arrangement of the two aforementioned contigs.
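The reasoning about shared spacers can be sketched in Python: if the last spacers of one array are identical to the first spacers of another, the two arrays (and hence their contigs) can be placed end to end. This is an illustrative sketch only. The function names and spacer labels are hypothetical, and it assumes both arrays are listed in the same orientation.

```python
from typing import List, Optional


def spacer_overlap(upstream: List[str], downstream: List[str]) -> int:
    """Largest n such that the last n spacers of `upstream` equal the first n of `downstream`."""
    for n in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def join_arrays(upstream: List[str], downstream: List[str],
                min_overlap: int = 2) -> Optional[List[str]]:
    """Merge two spacer lists across their shared spacers, or return None if unsupported."""
    n = spacer_overlap(upstream, downstream)
    if n < min_overlap:
        return None
    return upstream + downstream[n:]


if __name__ == "__main__":
    array_a = ["s1", "s2", "s3", "s4", "s5", "s6"]   # hypothetical spacer labels
    array_b = ["s3", "s4", "s5", "s6", "s7", "s8"]
    print(spacer_overlap(array_a, array_b))
    print(join_arrays(array_a, array_b))
```

In practice the spacers would be compared as sequences rather than labels, and the reverse-complement orientation of each array would also have to be tested.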
Long-range PCR and Nanopore sequencing. Four regions were selected for long-range PCR: two contig gaps, corresponding to CRISPR arrays, and two control regions spanning approximately 5 kbp of the rRNA operon and approximately 10 kbp of a ribosomal protein gene cluster (Supplementary Table 2). Primers were designed using OligoEvaluator (http://www.oligoevaluator.com/OligoCalcServlet) (Sigma-Aldrich) and synthesized by Integrated DNA Technologies. Multiple displacement amplification-amplified environmental DNA isolated from the Lower Culex Basin at Yellowstone National Park 21 was then amplified with Herculase polymerase (Agilent Technologies). Amplification of control and gap regions was then performed following the parameters shown in Supplementary Tables 5 and 6. Products were separated on a 0.8% agarose gel in 1× Tris-Borate-EDTA buffer stained with SYBR-Gold and purified using a QIAGEN Spin purification kit according to the manufacturer's instructions. Purified PCR fragments were pooled and used to construct a library with the SQK-LSK109 ligation kit. Sequencing was performed on an Oxford Nanopore MinION Mk1C sequencer using an R9.4.1 flow cell. Raw sequence data were basecalled using Guppy v.4.2.2. Reads were separated into 2 bins of 3-9 kbp (subsampled to 30×) and 9-12 kbp and processed to obtain consensus sequences using Decona 25 v.0.1.2 (-c 0.85 -w 6 -i -n 25 -M -r). Both control regions, comprising the rRNA and ribosomal protein operons, were 100% identical to the corresponding nucleotide sequences of the published assembly.
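The read binning and subsampling step can be pictured with the following Python sketch. It is not the Decona workflow itself. The function names are hypothetical, the bin boundaries are assumed to be inclusive at the lower end and exclusive at the upper end, and "30×" is interpreted here as total read bases equal to 30 times an assumed amplicon length.

```python
import random
from typing import Dict, List, Tuple

Bin = Tuple[int, int]


def bin_reads_by_length(read_lengths: Dict[str, int],
                        bins: Tuple[Bin, ...] = ((3000, 9000), (9000, 12000))
                        ) -> Dict[Bin, List[str]]:
    """Assign read IDs to length bins (lower bound inclusive, upper exclusive)."""
    binned: Dict[Bin, List[str]] = {b: [] for b in bins}
    for read_id, length in read_lengths.items():
        for low, high in bins:
            if low <= length < high:
                binned[(low, high)].append(read_id)
                break
    return binned


def subsample_to_depth(read_ids: List[str], read_lengths: Dict[str, int],
                       amplicon_length: int, depth: int = 30,
                       seed: int = 0) -> List[str]:
    """Randomly pick reads until their summed length reaches depth x amplicon_length."""
    shuffled = list(read_ids)
    random.Random(seed).shuffle(shuffled)
    target = depth * amplicon_length
    picked: List[str] = []
    total = 0
    for read_id in shuffled:
        if total >= target:
            break
        picked.append(read_id)
        total += read_lengths[read_id]
    return picked


if __name__ == "__main__":
    lengths = {"r1": 2500, "r2": 3000, "r3": 8999, "r4": 9000, "r5": 11999}
    print(bin_reads_by_length(lengths))
```

Reads shorter than 3 kbp or of 12 kbp and longer fall outside both bins and are dropped by this sketch.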
Hybrid assembly. Reads were filtered using NanoFilt v.2.6.0 with the options "-q 10 -l 1000". We used these filtered Nanopore reads and the mapped Illumina reads to perform a hybrid assembly with Unicycler v.0.4.4, which resolved both the main chromosomal contig and a viral contig (Huginn virus) as circular (Supplementary Fig. 1d,e). Read mapping was performed using Bowtie 2 (ref. 26) v.2.3.5.1 for Illumina reads and minimap2 (ref. 22) v.2.17.r941 for Nanopore reads. A local cumulative GC skew minimum (Supplementary Fig. 1f), together with low R-Y (purine minus pyrimidine), M-K (amino minus keto) and cumulative AT skew values, was selected as a potential replication origin; the circular contig was permuted to set this position as nucleotide +1.
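As an illustration of the skew-based choice of the first nucleotide, the following Python sketch computes a windowed cumulative GC skew, (G - C)/(G + C) summed window by window, and rotates a circular sequence to start where the curve is lowest. It is a simplification under stated assumptions: the function names and window size are hypothetical, the global minimum is taken rather than a curated local minimum, and the R-Y, M-K and AT skews mentioned above are not considered.

```python
from typing import List


def cumulative_gc_skew(seq: str, window: int = 1000) -> List[float]:
    """Cumulative sum of (G - C) / (G + C) over consecutive non-overlapping windows."""
    seq = seq.upper()
    values: List[float] = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        values.append(total)
    return values


def skew_minimum_position(seq: str, window: int = 1000) -> int:
    """0-based coordinate just after the window where the cumulative skew is lowest."""
    values = cumulative_gc_skew(seq, window)
    lowest = min(range(len(values)), key=values.__getitem__)
    return min((lowest + 1) * window, len(seq))


def rotate_circular(seq: str, new_start: int) -> str:
    """Permute a circular sequence so that `new_start` becomes the first base."""
    return seq[new_start:] + seq[:new_start]


if __name__ == "__main__":
    toy = "C" * 3000 + "G" * 3000   # skew falls for 3 kbp, then rises
    origin = skew_minimum_position(toy)
    print(origin, rotate_circular(toy, origin)[:5])
```

The resolution of the estimate is limited by the window size, so a real analysis would refine the position within the selected window.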
Fig. 2 (partial caption). … Supplementary Fig. 1. b,c, Comparison between the viral contigs of Jordarchaeia QZMA23B3 and Nitrososphaeria SpSt-845 (b) and of Muninn virus and viral contigs in the bins of Ca. Odinarchaeum LCB_4 and Lokiarchaeia E29_bin63 (c). Gene map similarity lines represent reciprocal BlastP hits with an E-value lower than 1 × 10⁻⁵ and percentage identity as shown in the upper-right legend.

Annotation. CRISPR arrays were detected and classified using CRISPRDetect 27 v.2.4 and Cas genes were detected and classified through CRISPRcasIdentifier 28 v.1.1.0. Spacer similarity searches were assessed against IMG/VR 29 v.3 (release 5.1) and against all available databases on the CRISPRTarget 30 webserver on 26 January 2022. Local spacer searches were performed using BlastN 31 v.2.10.0+ (-task blastn-short) against the Ca. Odinarchaeum assembly, its source metagenome and the nucleotide National Center for Biotechnology Information (NCBI) database. SpacePHARER 9 v5-c2e680a was used to search against the Ca. Odinarchaeum assembly and the 2018 GenBank phage and eukaryotic virus databases facilitated by the software, using as control sequences the eukaryotic virus database (with reversed sequences when using this database as target). WIsH 8 v.1.1 was used to predict host sequences of mobile element contigs, using Ca. Odinarchaeum and all archaeal representative genome sequences from the Genome Taxonomy Database (GTDB) 32 release 202. VirSorter2 (ref. 33

Phylogenetics. Reference pPolB sequences were obtained from Kim et al. 44 and used for PSI-BLAST 45 v.2.10.0+ against the NR v5 (as of 10 February 2021) database. Sequences with over 70% similarity were removed with CD-Hit 46 v.4.7. The remaining sequences were aligned with Mafft-linsi 47 v.7.450; columns with over 50% gaps were removed using trimAl 48 v.1.4.rev22. Additionally, sequences with over 50% gaps in the trimmed alignment were removed. Maximum-likelihood trees were reconstructed using IQ-TREE 49 v.2.0-rc1 and its implementation of ModelFinder 50 with all combinations of the empirical models LG, JTT, WAG and Q.pfam with the site class mixtures (none, C20, C40, C60), rate heterogeneity (none, G4 and R4) and frequency (none, F) parameters. Using the obtained tree as a guide, a posterior mean site frequency (PMSF) 51 approximation of the selected model (Q.pfam + C60 + R4 + F) was used to reconstruct a tree with 100 non-parametric bootstrap pseudo-replicates, which was then interpreted both as the standard Felsenstein bootstrap proportion (FBP) and as transfer bootstrap expectation (TBE) 52. Double jelly-roll and His1-like MCPs were separately searched with PSI-BLAST using the alignments of query sequences and references from Yutin et al. 10
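The two gap filters applied to the alignment (columns with over 50% gaps, then sequences with over 50% gaps in the trimmed alignment) can be sketched in plain Python. This does not call trimAl. The function names are hypothetical, and the alignment is assumed to be a dictionary of equal-length strings that use "-" for gaps.

```python
from typing import Dict


def trim_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove alignment columns whose gap fraction exceeds `max_gap_fraction`."""
    rows = list(alignment.values())
    n_seqs = len(rows)
    n_cols = len(rows[0]) if rows else 0
    keep = [
        i for i in range(n_cols)
        if sum(row[i] == "-" for row in rows) / n_seqs <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def drop_gappy_sequences(alignment: Dict[str, str],
                         max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove sequences whose gap fraction exceeds `max_gap_fraction`."""
    return {
        name: seq for name, seq in alignment.items()
        if seq and seq.count("-") / len(seq) <= max_gap_fraction
    }


if __name__ == "__main__":
    toy = {"a": "MK-LV", "b": "MK-L-", "c": "M--L-", "d": "----V"}
    trimmed = trim_gappy_columns(toy)
    print(drop_gappy_sequences(trimmed))
```

The order matters: removing gappy columns first changes each sequence's gap fraction, which is why the sequence filter is applied to the trimmed alignment, as in the text.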

Data availability
Raw Nanopore amplicon reads and the complete Ca. Odinarchaeum LCB_4 assembly are available at the NCBI under BioProject no. PRJNA319486. Additional data and supporting alignments and trees can be found at https://doi.org/10.6084/m9.figshare.19131413 (ref. 56). Source data are provided with this paper.

Code availability
No custom code was required for the analyses in this manuscript.

Reporting Summary
Last updated by author(s): Mar 25, 2022

Software and code
Data collection
All software (including versions) used to analyze data has been clearly described in the Methods section of the submitted manuscript. No custom code was used.

Data analysis
See above.

Data
Raw Nanopore amplicon reads and the complete Ca. Odinarchaeum LCB_4 assembly are available at NCBI under BioProject PRJNA319486. Additional data and supporting alignments and trees can be found in Figshare: project 122109.v1.